Self-Regulating Action Exploration in Reinforcement Learning

Authors

  • Teck-Hou Teng
  • Ah-Hwee Tan
  • Yuan-Sin Tan
Abstract

The basic tenet of a learning process is that an agent should learn only as much, and for only as long, as is necessary. In reinforcement learning, the learning process is divided between exploration and exploitation. Given the complexity of the problem domain and the randomness of the learning process, the exact duration of the reinforcement learning process can never be known with certainty. Using an inaccurate number of training iterations leads to either non-convergence or over-training of the learning agent. This work addresses these issues by proposing a technique that self-regulates the exploration rate and the training duration so that convergence is reached efficiently. The idea originates from the intuition that exploration is only necessary when the success rate is low, which means the exploration rate should be set in inverse proportion to the success rate. In addition, the change in exploration-exploitation rates alters the duration of the learning process. Using this approach, the duration of the learning process adapts to the current status of the learning process. Experimental results from the K-Armed Bandit and Air Combat Maneuver scenarios show that optimal action policies can be discovered using the right number of training iterations. In essence, the proposed method eliminates the guesswork about the amount of exploration needed during reinforcement learning. © 2012 The Authors. Published by Elsevier B.V. Selection and/or peer-review under responsibility of the Program Committee of INNS-WC 2012.
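The inverse relation between exploration and success described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the update rule, so the code below assumes an epsilon-greedy agent on a Bernoulli K-armed bandit whose exploration rate is one minus the success rate over a sliding window, and which stops training once that windowed success rate reaches a target. The names `run_bandit`, `window`, and `target` are hypothetical and chosen only for illustration.

```python
import random
from collections import deque


def run_bandit(arm_probs, window=50, target=0.9, max_steps=10_000, seed=0):
    """Epsilon-greedy bandit with epsilon = 1 - recent success rate (assumed rule)."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)
    values = [0.0] * len(arm_probs)   # running mean reward per arm
    recent = deque(maxlen=window)     # sliding window of 0/1 outcomes
    epsilon = 1.0
    for step in range(1, max_steps + 1):
        # Assumption: explore less as the recent success rate rises.
        success_rate = sum(recent) / len(recent) if recent else 0.0
        epsilon = 1.0 - success_rate
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_probs))                        # explore
        else:
            arm = max(range(len(arm_probs)), key=values.__getitem__)  # exploit
        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        recent.append(reward)
        # Assumption: training ends once a full window meets the target rate.
        if len(recent) == window and sum(recent) / window >= target:
            return step, epsilon, values
    return max_steps, epsilon, values


if __name__ == "__main__":
    steps, eps, vals = run_bandit([0.2, 0.5, 0.95])
    print(f"stopped after {steps} steps, epsilon={eps:.2f}, estimates={vals}")
```

Because the stopping test depends on the observed success rate rather than a fixed iteration count, the training duration in this sketch adapts to how quickly the agent succeeds, which is the behaviour the abstract attributes to the proposed method. A rule this simple can stop exploring after a run of early lucky rewards, so it should be read as an illustration of the idea only.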

Similar resources

Action Inhibition

An explicit exploration strategy is necessary in reinforcement learning (RL) to balance the need to reduce the uncertainty associated with the expected outcome of an action and the need to converge to a solution. This dependency is more acute in on-policy reinforcement learning where the exploration guides the search for an optimal solution. The need for a self-regulating exploration is manifes...

The Big Five Personality Factors, Self-Regulated Learning Strategies, and Academic Success of Young Scholars

Introduction: The concept of scientific success is attended to exploration of educational condition with emphasis on attitude, personality characters, and self-regulating learning strategies. The aim of this research was to study the personality characteristics, self-regulating learning strategies and scientific success of students. Method: In this descriptive correlation study, 283(206 boys...

How an Adaptive Learning Rate Benefits Neuro-Fuzzy Reinforcement Learning Systems

To acquire adaptive behaviors of multiple agents in an unknown environment, several neuro-fuzzy reinforcement learning systems (NFRLSs) have been proposed (Kuremoto et al.). Meanwhile, to manage the balance between exploration and exploitation in fuzzy reinforcement learning (FRL), an adaptive learning rate (ALR), which adjusts the learning rate by considering the “fuzzy visit value” of the current sta...

Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning

Exploration in complex domains is a key challenge in reinforcement learning, especially for tasks with very sparse rewards. Recent successes in deep reinforcement learning have been achieved mostly using simple heuristic exploration strategies such as ε-greedy action selection or Gaussian control noise, but there are many tasks where these methods are insufficient to make any learning progress. ...

The Investigation on the Relations between Epistemological Beliefs and Self-Regulating Learning Strategies among Students in North Khorasan University of Medical Sciences in 2017

Introduction: Poor epistemological beliefs are among the factors that lead a person to fatigue, lack of motivation, and distrust in personal abilities. This usually results from the fact that individuals with such beliefs view knowledge as unrelated and confusing elements surrounded by references, and they never try to learn it. In the recent years, there are high ...

Publication date: 2012